port probe in test suite and --cleanup fixes#987
Merged
Conversation
claim_ports() takes a POSIX byte-range lock per port, which serializes concurrent live test runs. But the kernel drops that lock the instant the holding process dies, even if the run left an orphaned rsync --daemon still bound to the port -- which happens when a run is SIGKILLed on a platform with no parent-death backstop (rsyncfns only arms PR_SET_PDEATHSIG, Linux-only, so the BSDs/Solaris/macOS can strand a daemon). A later run then wins the freed lock while the socket is still squatted and dies with a cryptic "bind() failed: Address already in use" / "did not see server greeting". After taking each lock, actually bind the port (SO_REUSEADDR, so a port merely in TIME_WAIT is not a false positive; only a live squatter fails) and close it immediately. On failure stop with an actionable message naming the port and the likely orphaned daemon. Closes the gap that masked the OpenBSD daemon-auth wedge.
… dirs
A run killed without a parent-death backstop can strand a TOCTOU path-flipper
(a busy `python -c` rename loop that pins a CPU) and an orphaned test rsyncd
(--no-detach --address=127.0.0.1) that squats its fixed port -- the wedge the
claim_ports() bind-probe now reports and points at --cleanup. Sweep both, best
effort, before removing the run dirs.
Each sweep counts the pattern, kills it (with a `sudo -n` retry for a process a
root-running test left), then re-counts after a settle: KILLED reports what
actually died, and a process that survives (pkill blocked, no passwordless sudo,
missing/limited pkill) is reported as SURVIVED and fails the run instead of
falsely claiming success.
Run-dir removal falls back to `sudo -n rm` so a dir whose contents a root test
owns is removed instead of failing with "Permission denied" (the failure mode
seen on the ubuntu/mac targets); only a dir that survives even sudo is failed.
The kill patterns use the pgrep self-exclusion trick ('r[e]name', 'det[a]ch')
so they match a real process's "rename"/"detach" but not the literal pattern in
the cleanup shell's own argv -- run_on() passes the whole script as the remote
argv, so without it --cleanup would signal itself. The patterns are host-global
(not scoped to one run), so --cleanup is documented to run between runs, not
during one.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This adds two things to the test suite: